A Layered Approach: Levels
How Can We Detect and Identify BW Agents?

Genotype  markers  known  to  show  variation 

- Fixed species-specific variants, previously identified

-  Rapid  detection  of  a  small  number  of  sites 

-  Example:  Real-Time  PCR  (Confirmatory  Lab) 

DNA  sequence  regions/genomes  of  interest 

- Maximally informative: the sequence is the genotype!

-  Detects  common  and  rare  variants 

-  Strain  identification/origin  (Definitive  Lab) 

The future detection and identification of
BW agents will increasingly depend
upon DNA sequencing technologies
Design of Resequencing Arrays

[Figure: resequencing array design. Legend: masked pixels; raw data pixels; A, C, G possible SNPs; T reference base]


Resequencing  Assay 


Long  PCR/Whole 
Genome  Amplification 


PCR  products 
pooled  by  individual; 


DNase I-treated DNA fragments


Tagged fragments hybridized
to an oligonucleotide array;
stained with streptavidin-phycoerythrin


Analyzed by ABACUS
to detect variation
Resequencing B. anthracis


•  29.5  kb  of  unique 
sequence  per  chip. 

•  Each  array  has  ~320,000 
features. 

•  Forward  and  reverse 
strands  tiled. 

•  1  design,  6  LPCR  assays 

• pXO1, pXO2, main
chromosome; all or
part of 32 genes

•  lef,  pag,  cap,  vrr,  rpoB, 
sasB 


CACTGTCCGGGTACTCGTAGGGCAG
CACTGTCCGGGTCCTCGTAGGGCAG
CACTGTCCGGGTGCTCGTAGGGCAG
CACTGTCCGGGTTCTCGTAGGGCAG

ACTGTCCGGGTTATCGTAGGGCAGT
ACTGTCCGGGTTCTCGTAGGGCAGT
ACTGTCCGGGTTGTCGTAGGGCAGT
ACTGTCCGGGTTTTCGTAGGGCAGT
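
The two probe sets listed above share their flanking bases and differ only at the central position, which is the tiling idea behind the array. A minimal sketch of how such sets could be generated from a reference is given here. The function name `tile_probes` is hypothetical, the 26-base reference string is inferred from the listing (assuming the central bases are T and then C), and only one strand is shown.

```python
BASES = "ACGT"

def tile_probes(reference, probe_length=25):
    """Yield (center_index, {base: probe}) for each position with full flanks.

    Each position gets four probes that are identical except for the
    central base, which is substituted with A, C, G and T in turn.
    """
    half = probe_length // 2
    for center in range(half, len(reference) - half):
        window = reference[center - half : center + half + 1]
        yield center, {
            base: window[:half] + base + window[half + 1 :]
            for base in BASES
        }

# Reference inferred from the slide's listing (an assumption, not source data).
reference = "CACTGTCCGGGTTCTCGTAGGGCAGT"

for center, probes in tile_probes(reference):
    print(f"position {center}, reference base {reference[center]}")
    for base, probe in probes.items():
        print(f"  {base}: {probe}")
```

Tiling the reverse strand would apply the same function to the reverse complement of the reference.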


❖  How  certain  are  we  of  this 
ABACUS: An Automated Statistical
Algorithm  for  Base/Genotype  Calling 


• Within any given feature, fluorescence intensities of
individual  pixels  are  assumed  to  be  independent  and 
identically  distributed  Gaussian  variables. 

•  Forward  and  reverse  strands  are  treated  as  independent 
replicates  (with  different  parameters). 

•  All  parameters  are  fit  by  maximum  likelihood. 

• 5 models for haploid data (null, A, C, G, T).

• 11 models for diploid data (null, AA, CC, GG, TT, AC, AG,
AT, CG, CT, GT).

•  Neighborhood  quality  rules  are  used. 
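
The modelling assumptions above can be illustrated with a toy calculation. The sketch below is not the ABACUS implementation. It assumes, purely for illustration, that under base model X the pixels of feature X share one Gaussian and the pixels of the other three features share a second Gaussian, each fit by maximum likelihood on a single strand. The function names and intensity values are hypothetical; the null model, the second strand, and the neighborhood rules are omitted.

```python
import math
from statistics import pvariance

def gaussian_max_log10_likelihood(pixels, var_floor=1e-9):
    """log10 likelihood of i.i.d. Gaussian pixels at the ML mean and variance.

    At the ML estimates the log likelihood reduces to
    -n/2 * (ln(2*pi*var) + 1), where var is the population variance.
    """
    n = len(pixels)
    var = max(pvariance(pixels), var_floor)  # floor guards against zero variance
    return -0.5 * n * (math.log(2 * math.pi * var) + 1) / math.log(10)

def base_model_log10_likelihoods(features):
    """Score the four haploid base models for one tiled position.

    `features` maps each base to the pixel intensities of its feature.
    """
    scores = {}
    for base in "ACGT":
        signal = features[base]
        background = [p for other in "ACGT" if other != base
                      for p in features[other]]
        scores[base] = (gaussian_max_log10_likelihood(signal)
                        + gaussian_max_log10_likelihood(background))
    return scores

# Hypothetical pixel intensities: the T feature is much brighter than the rest.
features = {
    "A": [100, 104, 98, 102],
    "C": [10, 12, 9, 11],
    "G": [11, 9, 10, 12],
    "T": [950, 1010, 990, 1005],
}
scores = base_model_log10_likelihoods(features)
best = max(scores, key=scores.get)
print(best, scores)
```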


ABACUS Assigns Quality Scores to Each Base/Genotype Call

• A Quality Score, the difference between the
log10 likelihood of the best fitting and
second best fitting model, is assigned to
each genotype.

•  Information  from  both  the  forward  and 
reverse  strands  is  incorporated  into  the 
Quality  Score. 

•  Genotypes  inferred  only  when  a  Quality 
Score  threshold  is  reached. 
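
The Quality Score rule described above can be sketched in a few lines. This is an illustration under stated assumptions: the function name `call_genotype`, the dictionary layout, and the example likelihood values are hypothetical, and the per-model log10 likelihoods are taken as already combined across the forward and reverse strands.

```python
def call_genotype(log10_likelihoods, threshold):
    """Return (call, quality_score) for one position.

    The quality score is the gap between the best and second-best
    log10 likelihoods; the call is None when the gap is below threshold.
    """
    ranked = sorted(log10_likelihoods.items(),
                    key=lambda item: item[1], reverse=True)
    (best_model, best_ll), (_, second_ll) = ranked[0], ranked[1]
    quality_score = best_ll - second_ll
    call = best_model if quality_score >= threshold else None
    return call, quality_score

# Hypothetical log10 likelihoods for the five haploid models.
confident = {"null": -140.0, "A": -120.0, "C": -118.5, "G": -121.0, "T": -80.0}
ambiguous = {"null": -140.0, "A": -100.0, "C": -90.0, "G": -121.0, "T": -119.0}

print(call_genotype(confident, threshold=30.0))
print(call_genotype(ambiguous, threshold=30.0))
```

Returning `None` rather than the best model keeps low-confidence positions out of downstream comparisons, which is what makes the call rate and the error rate separate quantities.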

For  more  detail,  see  Cutler,  DJ,  Zwick,  ME  et  al. 

Genome  Res.  2001  11:  1913-1925 


Distribution of Quality Scores (Human Data)


Haploid  ABACUS  Base  Calls 
Are  Highly  Accurate  (QS>30) 

•LPCR fragments hydrosheared
•Individual 8 from FMR1
•Subcloned with end-repair into pUC library
•Single-pass sequenced with M13 primers
•At least 6x


•17,423  bp  with  at  least  6x  coverage,  all  identical  to  ABACUS  calls 
• At 2x coverage, an additional 4,081 bp, with 1 difference from ABACUS calls


ABACUS  Genotype  Calls  Are 
Highly  Repeatable 

•  Haploid 

-  0  differences  /  841,236  sites  (QS>30) 

•  Diploid 

-  0  differences  /  812,944  homozygotes  (QS>30) 

-  0  differences  /  351  heterozygotes  (QS  >  30) 

•  Implies  a  phred  score  of  at  least  54 
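
The slide does not say how the figure of 54 was obtained. One reading that reproduces it, offered here as an assumption, is a 95% upper confidence bound on the error rate after observing zero differences at the 841,236 haploid sites, converted to the phred scale. The function name `phred_lower_bound` is hypothetical.

```python
import math

def phred_lower_bound(sites_compared, confidence=0.95):
    """Lower bound on the phred score after zero errors in n comparisons.

    With zero errors in n independent trials, the largest error rate p
    still consistent with the data solves (1 - p)^n = 1 - confidence.
    expm1 keeps precision when p is very small.
    """
    upper_error_rate = -math.expm1(math.log(1.0 - confidence) / sites_compared)
    return -10.0 * math.log10(upper_error_rate)

print(round(phred_lower_bound(841_236), 1))
```

Under this reading, pooling the haploid and diploid comparisons would give a larger sample and therefore a higher bound, so 54 is a conservative figure.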


B.  anthracis  Resequencing  Experiment 

•  Chips  Hybridized  and  Scanned:  114 

Successful:  112 
Experimental  Failure:  2 

•  B.  anthracis  Isolates  Analyzed:  59 

Replicated:  53  (106  chips) 

Single  Analysis:  6  (6  chips) 


Microarrays Can Generate Vast Amounts of Sequence Data

•  Raw  Sequence  Generated 

Bases  Called:  3,052,254 
Total  Possible  Bases:  3,271,744 

Call  Rate:  93.3% 

• Variant Sites Discovered

38 Single Nucleotide Polymorphisms (SNPs)

16  of  38  SNPs  singletons 
22  SNPs  found  more  than  once 


Anthrax  Resequencing  is  Highly  Replicable 


Total Comparisons: 1,420,583
Total Bases Called: 2,897,098
Total Discrepancies: 1

•  Suggests  error  rates  of  less  than  1  per  million 

• Quality Score Threshold: 31

•  Sequences  on  chip:  34.7%  GC  Content 
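
A replicate comparison of this kind can be tallied as follows. This is a minimal sketch: the function name `compare_replicates` and the use of `N` to mark a position with no call are assumptions, not details taken from the slides. A site counts as a comparison only when both replicates produced a call.

```python
def compare_replicates(calls_a, calls_b, no_call="N"):
    """Return (sites_compared, discrepancies) for two replicate call strings."""
    if len(calls_a) != len(calls_b):
        raise ValueError("replicates must cover the same positions")
    compared = 0
    discrepancies = 0
    for a, b in zip(calls_a, calls_b):
        if a == no_call or b == no_call:
            continue  # skip sites not called in both replicates
        compared += 1
        if a != b:
            discrepancies += 1
    return compared, discrepancies

# Toy example with one no-call site and one disagreement.
print(compare_replicates("ACGTN", "ACGAN"))

# Discrepancy rate implied by the totals reported on the slide.
rate = 1 / 1_420_583
print(f"{rate:.2e} discrepancies per comparison")
```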


How  different  are  two  B.  anthracis  isolates? 

•Variation Estimates

•Tajima’s Estimate of Theta: 1.6 × 10^-4
•Watterson’s Estimate of Theta: 2.9 × 10^-4

•Two Isolates of B. anthracis are expected to
differ at between:

- 924 (Tajima)
and
- 1,606 (Watterson) sites

Resequencing  can  uniquely  identify 
B.  anthracis  isolates 
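
For readers unfamiliar with the two estimators, the standard definitions can be sketched as follows. Watterson's estimate divides the number of segregating sites by the harmonic number of the sample size minus one and by the sequence length; Tajima's estimate is the mean number of pairwise differences per site. The function names are hypothetical. The inputs of 38 SNPs and 59 isolates come from the slides, while the 29,500 bp sequence length is an assumption based on the per-chip figure, so the result is only expected to land near the reported value, not to match it exactly.

```python
from itertools import combinations

def watterson_theta(segregating_sites, sample_size, sequence_length):
    """Per-site Watterson estimate: S / (a_n * L), a_n = sum_{i=1}^{n-1} 1/i."""
    a_n = sum(1.0 / i for i in range(1, sample_size))
    return segregating_sites / (a_n * sequence_length)

def tajima_theta(sequences):
    """Per-site Tajima estimate: mean pairwise differences divided by length."""
    length = len(sequences[0])
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in pairs
    )
    return total_differences / (len(pairs) * length)

# 38 SNPs and 59 isolates are from the slides; 29,500 bp is an assumption.
theta_w = watterson_theta(38, 59, 29_500)
print(f"Watterson theta per site: {theta_w:.2e}")

# Toy alignment (hypothetical) to show the pairwise calculation.
print(tajima_theta(["AAAA", "AAAT", "AAAT"]))
```

Multiplying a per-site estimate by a genome length gives the expected number of differing sites between two isolates, which is how figures such as 924 and 1,606 would arise for a genome of roughly 5.5 Mb (also an assumption here).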
Assessing ABACUS Performance

•  Replicability:  Comparison  of 
haploid/diploid  replicates  by  independent: 

-  PCR  amplification  of  genomic  DNA 

-  Manufacture  of  resequencing  arrays  (distinct  wafers) 

-  Hybridization  of  amplified  DNA  to  chips 

-  ABACUS  genotype  calls 

•  Accuracy:  Independent  Genotyping/DNA 
Sequencing 

All  genotyping  technologies  should  be 
assessed  using  these  criteria 


Diploid  ABACUS  Genotype  Calls 
Are  Highly  Accurate  (QS>30) 


•  Homozygous  genotypes 

-  0  differences  / 1,515  genotypes  (100%  correct) 

•  Heterozygous  genotypes 

-  3  differences  /  423  genotypes  (99.3%  correct) 

•Two of the three differences were in a single LPCR fragment
•All three differences were at high frequency sites
•Chips called heterozygote, sequencing called homozygote


Probable  Cause:  Sample  Mixing 


